Automatically Identifying Computationally Relevant Typological Features
Abstract
In this paper we explore the potential for identifying computationally relevant typological features from a multilingual corpus built from readily available language data collected from the Web. Our work builds on previous work on structural projection, which we extend to build individual context-free grammars (CFGs) for approximately 100 languages. We then use the CFGs to discover the values of typological parameters such as word order and the presence or absence of definite and indefinite determiners. Our methods could be extended to many more languages and parameters, and could have a significant effect on current research into tool and resource development for low-density languages and grammar induction from raw corpora.
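As a rough illustration of how typological parameter values might be read off a grammar, the following minimal sketch votes on verb-object order and checks for determiners using weighted CFG rules. The rule set, the category labels (S, VP, NP, V, DET, N), the counts, and both function names are hypothetical assumptions made for this example; the abstract does not specify the paper's actual rule format or decision procedure.

```python
from collections import Counter

# Hypothetical toy grammar: (left-hand side, right-hand side, rule count).
# Labels and counts are illustrative assumptions, not data from the paper.
RULES = [
    ("S", ("NP", "VP"), 40),
    ("VP", ("V", "NP"), 35),
    ("VP", ("NP", "V"), 5),
    ("NP", ("DET", "N"), 30),
    ("NP", ("N",), 10),
]


def verb_object_order(rules):
    """Return 'VO', 'OV', or 'undetermined' by weighted vote over VP rules."""
    votes = Counter()
    for lhs, rhs, count in rules:
        # Only VP rules containing both a verb and an object NP are informative.
        if lhs != "VP" or "V" not in rhs or "NP" not in rhs:
            continue
        order = "VO" if rhs.index("V") < rhs.index("NP") else "OV"
        votes[order] += count
    if not votes or votes["VO"] == votes["OV"]:
        return "undetermined"
    return max(votes, key=votes.get)


def has_determiners(rules, det_label="DET"):
    """Return True if any NP rule expands to a determiner category."""
    return any(lhs == "NP" and det_label in rhs for lhs, rhs, _ in rules)


print(verb_object_order(RULES))
print(has_determiners(RULES))
```

A weighted vote is used here so that a few noisy rules do not flip the decision; a real system would likely need a confidence threshold as well.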
Similar resources
The shape and tempo of language evolution.
There are approximately 7000 languages spoken in the world today. This diversity reflects the legacy of thousands of years of cultural evolution. How far back we can trace this history depends largely on the rate at which the different components of language evolve. Rates of lexical evolution are widely thought to impose an upper limit of 6000-10,000 years on reliably identifying language relat...
Discriminative Analysis of Linguistic Features for Typological Study
We address the task of automatically estimating the missing values of linguistic features by making use of the fact that some linguistic features in typological databases are informative to each other. The questions to address in this work are (i) how much predictive power do features have on the value of another feature? (ii) to what extent can we attribute this predictive power to genealogica...
Typology and Areality
Analysis of typological features has received much attention in recent years. Despite claims that such analysis replicates the families and subgroups of families that are the product of work using the comparative method, there is strong evidence that the geographic distribution of typological features reflects socio-geographically relevant areas, rather than historically related subgroups. Furt...
Language, Emotion and Metapragmatics: A Theory Based on Typological Evidence
Humans are equipped with some universal or language-specific abilities to recognize emotions. However, because of the different emotional contents in diverse languages and the relevant cultural differences, humans with different cultural backgrounds own different metapragmatical abilities to recognize and express emotions. A hypothesis concerning emotional effects about intonation and particle ...
Tracking Typological Traits of Uralic Languages in Distributed Language Representations
Although linguistic typology has a long history, computational approaches have only recently gained popularity. The use of distributed representations in computational linguistics has also become increasingly popular. A recent development is to learn distributed representations of language, such that typologically similar languages are spatially close to one another. Although empirical successe...